AITopics | simmc 2

Collaborating Authors

simmc 2

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

'What are you referring to?' Evaluating the Ability of Multi-Modal Dialogue Models to Process Clarificational Exchanges

Chiyah-Garcia, Javier, Suglia, Alessandro, Eshghi, Arash, Hastie, Helen

arXiv.org Artificial IntelligenceJul-28-2023

Referential ambiguities arise in dialogue when a referring expression does not uniquely identify the intended referent for the addressee. Addressees usually detect such ambiguities immediately and work with the speaker to repair it using meta-communicative, Clarificational Exchanges (CE): a Clarification Request (CR) and a response. Here, we argue that the ability to generate and respond to CRs imposes specific constraints on the architecture and objective functions of multi-modal, visually grounded dialogue models. We use the SIMMC 2.0 dataset to evaluate the ability of different state-of-the-art model architectures to process CEs, with a metric that probes the contextual updates that arise from them in the model. We find that language-based models are able to encode simple multi-modal semantic information and process some CEs, excelling with those related to the dialogue history, whilst multi-modal models can use additional learning objectives to obtain disentangled object representations, which become crucial to handle complex referential ambiguities across modalities overall.

machine learning, natural language, object-oriented architecture, (19 more...)

arXiv.org Artificial Intelligence

2307.15554

Country:

North America > Dominican Republic (0.04)
Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(9 more...)

Genre:

Personal > Interview (0.83)
Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.49)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.49)

Add feedback

Exploring Multi-Modal Representations for Ambiguity Detection & Coreference Resolution in the SIMMC 2.0 Challenge

Chiyah-Garcia, Javier, Suglia, Alessandro, Lopes, José, Eshghi, Arash, Hastie, Helen

arXiv.org Artificial IntelligenceJul-26-2023

Anaphoric expressions, such as pronouns and referential descriptions, are situated with respect to the linguistic context of prior turns, as well as, the immediate visual environment. However, a speaker's referential descriptions do not always uniquely identify the referent, leading to ambiguities in need of resolution through subsequent clarificational exchanges. Thus, effective Ambiguity Detection and Coreference Resolution are key to task success in Conversational AI. In this paper, we present models for these two tasks as part of the SIMMC 2.0 Challenge (Kottur et al. 2021). Specifically, we use TOD-BERT and LXMERT based models, compare them to a number of baselines and provide ablation experiments. Our results show that (1) language models are able to exploit correlations in the data to detect ambiguity; and (2) unimodal coreference resolution models can avoid the need for a vision component, through the use of smart object representations.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2202.12645

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
Europe > Sweden > Vaestra Goetaland > Gothenburg (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

SimpleMTOD: A Simple Language Model for Multimodal Task-Oriented Dialogue with Symbolic Scene Representation

Hemanthage, Bhathiya, Dondrup, Christian, Bartie, Phil, Lemon, Oliver

arXiv.org Artificial IntelligenceJul-10-2023

SimpleMTOD is a simple language model which recasts several sub-tasks in multimodal task-oriented dialogues as sequence prediction tasks. SimpleMTOD is built on a large-scale transformer-based auto-regressive architecture, which has already proven to be successful in uni-modal task-oriented dialogues, and effectively leverages transfer learning from pre-trained GPT-2. In-order to capture the semantics of visual scenes, we introduce both local and de-localized tokens for objects within a scene. De-localized tokens represent the type of an object rather than the specific object itself and so possess a consistent meaning across the dataset. SimpleMTOD achieves a state-of-the-art BLEU score (0.327) in the Response Generation sub-task of the SIMMC 2.0 test-std dataset while performing on par in other multimodal sub-tasks: Disambiguation, Coreference Resolution, and Dialog State Tracking. This is despite taking a minimalist approach for extracting visual (and non-visual) information. In addition the model does not rely on task-specific architectural changes such as classification heads.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2307.04907

Country:

North America > United States > California > San Diego County > San Diego (0.04)
North America > Dominican Republic (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.67)

Add feedback

Which One Are You Referring To? Multimodal Object Identification in Situated Dialogue

Lovenia, Holy, Cahyawijaya, Samuel, Fung, Pascale

arXiv.org Artificial IntelligenceMar-15-2023

The demand for multimodal dialogue systems has been rising in various domains, emphasizing the importance of interpreting multimodal inputs from conversational and situational contexts. We explore three methods to tackle this problem and evaluate them on the largest situated dialogue dataset, SIMMC 2.1. Our best method, scene-dialogue alignment, improves the performance by ~20% F1-score compared to the SIMMC 2.1 baselines. We provide analysis and discussion regarding the limitation of our methods and the potential directions for future works. Our code is publicly available at https://github.com/holylovenia/multimodal-object-identification.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2302.1468

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > China > Hong Kong (0.05)
(8 more...)

Genre:

Research Report (0.50)
Personal > Interview (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Multimodal Interactions Using Pretrained Unimodal Models for SIMMC 2.0

Lee, Joosung, Han, Kijong

arXiv.org Artificial IntelligenceDec-18-2021

This paper presents our work on the Situated Interactive MultiModal Conversations 2.0 challenge held at Dialog State Tracking Challenge 10. SIMMC 2.0 includes 4 subtasks, and we introduce our multimodal approaches for the subtask \#1, \#2 and the generation of subtask \#4. SIMMC 2.0 dataset is a multimodal dataset containing image and text information, which is more challenging than the problem of only text-based conversations because it must be solved by understanding the relationship between image and text. Therefore, since there is a limit to solving only text models such as BERT or GPT2, we propose a multimodal model combining image and text. We first pretrain the multimodal model to understand the relationship between image and text, then finetune our model for each task. We achieve the 3rd best performance in subtask \#1, \#2 and a runner-up in the generation of subtask \#4. The source code is available at https://github.com/rungjoo/simmc2.0.

subtask, system utterance, utterance, (16 more...)

arXiv.org Artificial Intelligence

2112.05328

Country:

Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > Italy > Tuscany > Florence (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.68)

Add feedback

SIMMC 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations

Kottur, Satwik, Moon, Seungwhan, Geramifard, Alborz, Damavandi, Babak

arXiv.org Artificial IntelligenceApr-17-2021

We present a new corpus for the Situated and Interactive Multimodal Conversations, SIMMC 2.0, aimed at building a successful multimodal assistant agent. Specifically, the dataset features 11K task-oriented dialogs (117K utterances) between a user and a virtual assistant on the shopping domain (fashion and furniture), grounded in situated and photo-realistic VR scenes. The dialogs are collected using a two-phase pipeline, which first generates simulated dialog flows via a novel multimodal dialog simulator we propose, followed by manual paraphrasing of the generated utterances. In this paper, we provide an in-depth analysis of the collected dataset, and describe in detail the four main benchmark tasks we propose for SIMMC 2.0. The preliminary analysis with a baseline model highlights the new challenges that the SIMMC 2.0 dataset brings, suggesting new directions for future research. Our dataset and code will be made publicly available.

dialog, simmc 2, utterance, (15 more...)

arXiv.org Artificial Intelligence

2104.08667

Country:

Asia (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > New York (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback